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programs 

DATE-ISSUED: February 15, 2000 
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Cambridge 
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OTHER PUBLICATIONS 

Joseph A. Fisher, Paolo Faraboschi and Giuseppe Desoli; Hewlett-Packard 
Laboratories Cambridge, 1 Maine Street, Cambridge, MA 02142; "Custom-Fit 
Processors: Letting Applications Define Architectures"; 29th Annual IEEE/ACM 
International Symposium on Microarchitecture; Dec. 2-4, 1996, Paris, France. 

ART-UNIT: 273 

PRIMARY-EXAMINER: Coleman; Eric 
ABSTRACT:

A CPU having a cluster VLIW architecture is shown which operates in both a high
instruction level parallelism (ILP) mode and a low ILP mode. In high ILP mode, the
CPU executes wide instruction words using all operational clusters of the CPU and
all of a main instruction cache and main data cache of the CPU are accessible to a
high ILP task. The CPU also includes a mini-instruction cache, a mini-instruction
register and a mini-data cache which are inactive during high ILP mode. An
instruction level controller in the CPU receives a low ILP signal, such as an
interrupt or function call to a low ILP routine, and switches to low ILP mode. In
low ILP mode, the main instruction cache and main data cache are deactivated to
preserve their contents. At the same time, a predetermined cluster remains active
while the remaining clusters are also deactivated. The low ILP task executes
instructions from the mini-instruction cache which are input to the predetermined
cluster through the mini-instruction register. The mini-data cache stores operands
for the low ILP task. The separate mini-instruction cache and mini-data cache along
with the use of only the predetermined cluster minimizes the pollution of the main
instruction and data caches, as well as pollution of register files in the
deactivated clusters, with regard to a task executing in high ILP mode.
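
The mode switch described in this abstract can be pictured with a small software model. The following is only an illustrative sketch, not the patented circuit: the class name, the choice of four clusters, the use of cluster 0 as the "predetermined" cluster and the dictionary-based caches are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class ClusteredCpu:
    """Toy model of a two-mode clustered VLIW CPU (all names are hypothetical)."""
    num_clusters: int = 4            # assumed cluster count
    low_ilp_cluster: int = 0         # assumed index of the predetermined cluster
    mode: str = "high"
    main_dcache: dict = field(default_factory=dict)
    mini_dcache: dict = field(default_factory=dict)

    def active_clusters(self):
        # High ILP mode uses every cluster; low ILP mode uses only one.
        if self.mode == "high":
            return list(range(self.num_clusters))
        return [self.low_ilp_cluster]

    def signal_low_ilp(self):
        # Stands in for an interrupt or a call to a low ILP routine.
        self.mode = "low"

    def resume_high_ilp(self):
        self.mode = "high"

    def store(self, addr, value):
        # In low ILP mode the main data cache is left untouched, so the
        # high ILP task finds its contents intact when it resumes.
        cache = self.main_dcache if self.mode == "high" else self.mini_dcache
        cache[addr] = value


cpu = ClusteredCpu()
cpu.store(0x10, 1)          # high ILP task writes through the main data cache
cpu.signal_low_ilp()
cpu.store(0x20, 2)          # low ILP task writes through the mini-data cache
cpu.resume_high_ilp()
```

After this sequence the main data cache still holds only the high ILP task's entry, which is the "no pollution" property the abstract emphasizes.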

41 Claims, 7 Drawing figures 
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Feb 15, 2000 



DOCUMENT-IDENTIFIER: US 6026479 A

TITLE: Apparatus and method for efficient switching of CPU mode between regions of 
high instruction level parallism and low instruction level parallism in computer 
programs 

Brief Summary Text (5) : 

High performance VLIW central processing units (CPUs) with multiple functional
units are designed to obtain improved processing performance by executing code
which has high instruction level parallelism (ILP). Clustered VLIW machines (CVLIW)
further divide the CPU architecture into clusters which each contain one or more
functional units and a separate register file. Instructions in the code are divided
into sub-instructions which are input to each cluster and which may be executed in
parallel.
DOCUMENT-IDENTIFIER: US 6631439 B2

TITLE: VLIW computer processing architecture with on-chip dynamic RAM 
APPL-NO: 09/802324 [PALM]
DATE FILED: March 8, 2001

PARENT-CASE:

CROSS-REFERENCES TO RELATED APPLICATIONS This application claims the benefit of
U.S. Provisional Patent Application Ser. No. 60/187,796, filed on Mar. 8, 2000 and
entitled "VLIW Computer Processing Architecture with On-Chip Dynamic RAM," the
entirety of which is incorporated by reference herein for all purposes.

INT-CL: [07] G06 F 12/00
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FOREIGN PATENT DOCUMENTS 



FOREIGN-PAT-NO    PUBN-DATE    COUNTRY    US-CL

WO 00/33178       June 2000    WO

OTHER PUBLICATIONS 



Mitsubishi Electric Corp; Product Specification for Single-Chip CMOS Microcomputer; 
May 1998.* 

Numomura Y et al.: "M32R/D-Integrating DRAM and Microprocessor" IEEE Inc. New York,
US, vol. 17 No. 6, Nov. 1, 1997, pp. 40-48, XP000726003; ISSN: 0272-1732.

Kozyrakis C E et al.: "Scalable Processors in the Billion-Transistor Era: IRAM"
Computer, IEEE Computer Society, Long Beach, CA, US, vol. 30, No. 9, Sep. 1,
1997, pp. 75-78, XP000730003; ISSN: 0018-9162.

Herrmann Klaus, Hilgenstock Joerg, Pirsch Peter: "Architecture of a Multiprocessor
System with Embedded DRAM for Large Area Integration" Oct. 8, 1997, IEEE
International Conference on Innovative Systems in Silicon, Piscataway, NJ, USA;
XP002179990.

Aimoto, Yoshiharu et al.; "A 7.68GIPS 3.84GB/s 1 W Parallel Image-Processing RAM
Integrating a 16 Mb DRAM and 128 Processors"; ISSCC96/Session 23/DRAM/Paper
SP23.3; 1996 IEEE International Solid-State Circuits Conference; pp. 372-373 and
476.

Bursky, Dave; "Combo RISC CPU and DRAM Solves Data Bandwidth Issues"; Electronic 
Design; Mar. 4, 1996; pp. 67-71. 
Saulsbury, Ashley, et al.; "Missing the Memory Wall: The Case for Processor/Memory
Integration"; ACM; 1996; pp. 90-101.

Shimizu, Toro, et al.; "A Multimedia 32b RISC Microprocessor with 16 Mb DRAM";
ISSCC96/Session 13/Microprocessors/Paper FP 13.4; 1996 IEEE International Solid
State Circuits Conference; pp. 216-217 and 448.

Mitsubishi Electric Corp; Product Specification for Single-Chip 32-Bit CMOS
Microcomputer; © May 1998.

ART-UNIT: 2187 

PRIMARY-EXAMINER: Nguyen; T. V.

ATTY-AGENT-FIRM: Townsend and Townsend and Crew LLP



ABSTRACT:

A novel processor chip (10) having a processing core (12), at least one bank of
memory (14), an I/O link (26) configured to communicate with other like processor
chips or compatible I/O devices, a memory controller (20) in electrical
communication with processing core (12) and memory (14), and a distributed shared
memory controller (22) in electrical communication with memory controller (20) and
I/O link (26). Distributed shared memory controller (22) is configured to control
the exchange of data between processor chip (10) and the other processor chips or
I/O devices. In addition, memory controller (20) is configured to receive memory
requests from processing core (12) and distributed shared memory controller (22)
and process the memory request with memory (14). Processor chip (10) may further
comprise an external memory interface (24) in electrical communication with memory
controller (20). External memory interface (24) is configured to connect processor
chip (10) with external memory, such as DRAM. Memory controller (20) is configured
to receive memory requests from processing core (12) and distributed shared memory
controller (22), determine whether the memory requests are directed to memory (14)
on chip (10) or the external memory, and process the memory requests with memory
(14) on processor chip (10) or with the external memory through external memory
interface (24).
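
The routing role of memory controller (20) can be sketched in a few lines. This is a minimal model under stated assumptions, not the patented design: the abstract does not say how the controller decides where a request is directed, so the single address boundary (`on_chip_size`), the class and method names, and the request log are all hypothetical.

```python
class MemoryController:
    """Toy model of memory controller (20); the address split is an assumption."""

    def __init__(self, on_chip_size):
        self.on_chip_size = on_chip_size  # hypothetical: lower addresses are on-chip
        self.on_chip = {}                 # stands in for memory (14)
        self.external = {}                # stands in for DRAM behind interface (24)
        self.log = []                     # (source, kind, target) for each request

    def _target(self, addr):
        # Decide whether the request is directed to on-chip or external memory.
        if addr < 0:
            raise ValueError("address must be non-negative")
        if addr < self.on_chip_size:
            return "on_chip", self.on_chip
        return "external", self.external

    def write(self, addr, value, source="core"):
        name, store = self._target(addr)
        store[addr] = value
        self.log.append((source, "write", name))

    def read(self, addr, source="core"):
        name, store = self._target(addr)
        self.log.append((source, "read", name))
        return store.get(addr, 0)  # unwritten locations read as 0 in this model


mc = MemoryController(on_chip_size=1024)
mc.write(10, "a")                    # request from processing core (12)
mc.write(5000, "b", source="dsm")    # request from distributed shared memory controller (22)
```

Both requesters go through the same entry points, which mirrors the abstract's description of one controller serving the core and the distributed shared memory controller alike.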

37 Claims, 5 Drawing figures 
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Jul 31, 1997 



DERWENT-ACC-NO: 1997-470440 
DERWENT-WEEK: 200255

COPYRIGHT 2004 DERWENT INFORMATION LTD

TITLE: Array bounds checking apparatus - includes comparison elements verifying 
that referenced element is between maximum and minimum array size boundary values 
by comparison 

INVENTOR: JOY, W N; O'CONNOR, J M; TREMBLAY, M; OCONNOR, J M



PATENT-ASSIGNEE:
ASSIGNEE                CODE
SUN MICROSYSTEMS INC    SUNM



PRIORITY-DATA: 1996US-0642248 (May 2, 1996), 1996US-010527P (January 24, 1996)



PATENT-FAMILY:
PUB-NO
WO 9727544 A1
DE 69713400 E
EP 976050 A1
JP 2000501217 W
KR 99081958 A
EP 976050 B1

PUB-DATE
DESIGNATED-STATES: CN JP KR AT BE CH DE DK ES FI FR GB GR IE IT LU MC NL PT SE DE
FR GB NL SE DE FR GB NL SE

CITED-DOCUMENTS: 1.Jnl.Ref; EP 17670; EP 214490; EP 535538; JP 4071050; US
3573855



APPLICATION-DATA : 
PUB -NO 

WO 9727544A1 
DE 69713400E 
DE 69713400E 
DE 69713400E 
DE 69713400E 
DE 69713400E 



APPL-DATE 

January 23, 1997 

January 23, 1997 

January 23, 1997 

January 23, 1997 



APPL-NO 

1997WO-US01305 

1997DE-0613400 

1997EP-0904011 

1997WO-US01305 

EP 976050 

WO 9727544 



DESCRIPTOR 



Based on 
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1997EP-0904011 
1997WO-US01305 
WO 9727544 
1997JP-0527086 
1997WO-US01305 
WO 9727544 
1997WO-US01305 
1998KR-0705676 
WO 9727544 
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WO 9727544 
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Based on 
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Based on 



INT-CL (IPC): G06 F 11/28; G06 F 12/14

RELATED-ACC-NO: 1997-393871; 1997-393873; 1997-470439

ABSTRACTED-PUB-NO: EP 976050B
BASIC-ABSTRACT:

The array bounds checking apparatus includes an associative memory element which is 
configured to store and retrieve array size values associated with array access 
instructions. Each array access instruction references a value of an element in the 
array. 

A comparison element is coupled to an output of the associative memory element. 
This comparison element compares a given maximum array size with the value of the 
referenced element so as to provide a maximum violation signal. A second comparison 
element compares a given minimum array size and the value of the referenced element 
so as to provide a minimum violation signal. 

ADVANTAGE - Reduces time required to retrieve array information and verify size. 
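
The checking scheme in this abstract can be mimicked in software as follows. This is a hedged sketch only: a Python dictionary stands in for the associative memory element, the instruction identifier `"aload_7"` is invented for the example, and treating both boundaries as inclusive is an assumption the abstract does not settle.

```python
class BoundsChecker:
    """Software stand-in for the associative memory plus two comparison elements."""

    def __init__(self):
        # Associative memory: array access instruction id -> (minimum, maximum).
        self._bounds = {}

    def register(self, instr_id, minimum, maximum):
        self._bounds[instr_id] = (minimum, maximum)

    def check(self, instr_id, element):
        """Return (maximum_violation, minimum_violation) for a referenced element."""
        minimum, maximum = self._bounds[instr_id]
        max_violation = element > maximum   # first comparison element
        min_violation = element < minimum   # second comparison element
        return max_violation, min_violation


checker = BoundsChecker()
checker.register("aload_7", 0, 9)   # hypothetical instruction id, 10-element array
```

In hardware the two comparisons would run in parallel on the looked-up values; here they are simply evaluated one after the other.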
ABSTRACTED-PUB-NO: WO 9727544A
EQUIVALENT-ABSTRACTS:

The array bounds checking apparatus includes an associative memory element which is 
configured to store and retrieve array size values associated with array access 
instructions. Each array access instruction references a value of an element in the 
array. 

A comparison element is coupled to an output of the associative memory element. 
This comparison element compares a given maximum array size with the value of the 
referenced element so as to provide a maximum violation signal. A second comparison 
element compares a given minimum array size and the value of the referenced element 
so as to provide a minimum violation signal. 

ADVANTAGE - Reduces time required to retrieve array information and verify size. 
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File: USPT 



Oct 17, 1995 



US-PAT-NO: 5459798 

DOCUMENT-IDENTIFIER: US 5459798 A

** See image for Certificate of Correction ** 

TITLE: System and method of pattern recognition employing a multiprocessing 
pipelined apparatus with private pattern memory 

DATE-ISSUED: October 17, 1995 



INVENTOR-INFORMATION:
NAME                  CITY        STATE    ZIP CODE    COUNTRY
Bailey; Delbert D.    Aptos       CA
Dulong; Carole        Saratoga    CA



ASSIGNEE-INFORMATION:
NAME                 CITY           STATE    ZIP CODE    COUNTRY    TYPE CODE
Intel Corporation    Santa Clara    CA                              02



APPL-NO: 08/060579 [PALM]
DATE FILED: May 12, 1993

PARENT-CASE:

1. Related US Application The present invention is a continuation in part of 
application Ser. No. 08/034,678 filed Mar. 19, 1993, entitled, "A Memory Transfer 
Apparatus and Method Useful Within a Pattern Recognition System" and assigned to 
the assignee of the present invention. 

INT-CL: [06] G06 K 9/68 

US-CL-ISSUED: 382/218; 382/303, 364/231.8, 364/DIG.1, 364/926.8, 364/948.34,
364/DIG.2

US-CL-CURRENT: 382/218; 382/303, 704/231, 712/24

FIELD-OF-SEARCH: 382/13, 382/30, 382/33, 382/34, 382/41, 382/49, 364/229.2,
364/231.8, 364/240.1, 364/240.2, 364/242.3, 364/243, 364/243.43, 364/243.44,
364/243.45, 364/926.8, 364/948.34, 364/964

PRIOR-ART-DISCLOSED:

U.S. PATENT DOCUMENTS

PAT-NO     ISSUE-DATE       PATENTEE-NAME     US-CL
4395699    July 1983        Sternberg         382/41
4790026    December 1988    Gennery et al.    382/49
5014327    May 1991         Potter et al.     382/14
5111512    May 1992         Fan et al.        382/3
5226091    July 1993        Howell et al.     382/3
5265174    November 1993    Nakatsuka         382/38

FOREIGN PATENT DOCUMENTS 



FOREIGN-PAT-NO    PUBN-DATE        COUNTRY    US-CL
0550865           July 1993        EP         382/13
56122445          February 1983    JP         382/34
62-117744         November 1988    JP         382/34

OTHER PUBLICATIONS 

English Translation of Japanese Kokai 58-24975 (to Komiya, publ. Feb. 1983).
English Translation of Japanese Kokai 63-282586 (to Minewaki, publ. Nov. 1988).

ART-UNIT: 266

PRIMARY-EXAMINER: Boudreau; Leo H.
ASSISTANT-EXAMINER: Johns; Andrew W.

ATTY-AGENT-FIRM: Blakely, Sokoloff, Taylor & Zafman
ABSTRACT:

A computer implemented apparatus and method of pattern recognition utilizing a 
pattern recognition engine coupled with a general purpose computer system. The 
present invention system provides increased accuracy and performance in handwriting 
and voice recognition systems and may interface with general purpose computer 
systems. A pattern recognition engine is provided within the present invention that 
contains five pipelines which operate in parallel and are specially optimized for 
Dynamic Time Warping and Hidden Markov Models procedures for pattern recognition, 
especially handwriting recognition. These pipelines comprise two arithmetic 
pipelines, one control pipeline and two pointer pipelines. Further, a private 
memory is associated with each pattern recognition engine for library storage of 
reference or prototype patterns. Recognition procedures are partitioned across a 
CPU and the pattern recognition engine. Use of a private memory allows quick access 
of the library patterns without impeding the performance of programs operating on 
the main CPU or the host bus. Communication between the CPU and the pattern 
recognition engine is accomplished over the host bus. 
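
For readers unfamiliar with Dynamic Time Warping, one of the two procedures the pipelines are said to be optimized for, the textbook form of the algorithm is sketched below. This is the generic dynamic-programming formulation on one-dimensional sequences with an absolute-difference cost, chosen here for illustration; it is not taken from the patent, and the function names and the tiny prototype library are hypothetical.

```python
def dtw_distance(a, b):
    """Classic DTW cost between two numeric sequences (absolute-difference cost)."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] against b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch b
                                 cost[i][j - 1],      # stretch a
                                 cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def best_match(sample, library):
    """Return the label of the library prototype with the lowest DTW cost."""
    return min(library, key=lambda label: dtw_distance(sample, library[label]))


# Hypothetical prototype library, standing in for the private pattern memory.
prototypes = {"rise": [1, 2, 3], "flat": [2, 2, 2]}
```

Each prototype comparison is independent of the others, which is one reason this kind of workload suits a dedicated engine with its own pattern memory.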

50 Claims, 18 Drawing figures 
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TITLE: System and method of pattern recognition employing a multiprocessing 
pipelined apparatus with private pattern memory 



Detailed Description Text (33) : 

The pertinent components of the pattern recognition engine 525 (programmable
multiprocessor) are illustrated in FIG. 5, which except for the external memory 615
are located within a single chip package. FIG. 5 also illustrates the communication 
bus architecture shared between the components of the PR engine 525. Each pattern 
recognition engine contains: a program memory 415, two data memories 30 and 32, a 
memory controller 419, a memory to memory transfer block 416, a VLIW execution 
block 417 and a system bus interface block 418. It is appreciated that any of the 
well known system bus interface technologies may be utilized within the present 
invention PR engine. The execution unit 430 is comprised of program memory block 
415, data memories 30 and 32 and VLIW execution block 417 as well as other elements 
to be described below. Interfaced to each PR engine 525 is a private memory block 
615 as discussed above. The system bus interface 418 is coupled to the ISA system 
bus 100. Each of the above blocks, where pertinent to the discussions of the 
present invention, will be described in greater detail to follow. It is appreciated 
that the program memory 415 may be loaded with the lower level procedures by the 
CPU 510 directing transfers from the disk 516 or RAM 512. 
Abstract of the Disclosure

A processor has a flexible architecture that efficiently
handles computing applications having a range of instruction-level parallelism from
a very low degree to a very high degree of instruction-level parallelism. The
processor includes a plurality of processing units, an individual processing unit
of the plurality of processing units including a multiple-instruction parallel
execution path. For computing applications having a low degree of instruction-level
parallelism, the processor includes control logic that controls the plurality of
processing units to execute instructions mutually independently in a plurality of
independent execution threads. For computing applications having a high degree of
instruction-level parallelism, the processor further includes control logic that
controls the plurality of processing units with a low thread synchronization to
operate in combination using spatial software pipelining in the manner of a single
wide-issue processor. The control logic in the processor alternatively controls the
plurality of processing units to operate: (1) in a multiple-thread operation on the
basis of a highly parallel structure including multiple independent parallel
execution paths for executing in parallel across threads and a multiple-instruction
parallel pathway within a thread, and (2) in a single-thread wide-issue operation
on the basis of the highly parallel structure including multiple parallel execution
paths with low level synchronization for executing the single wide-issue thread.
The multiple independent parallel execution paths include functional units that
execute an instruction set including special data-handling instructions that are
advantageous in a multiple-thread environment.



ABSTRACT: 
Summary of Invention Paragraph:

[0015] For applications with a high level of instruction-level parallelism, the
VLIW processor executes a plurality of instructions in parallel on the plurality of
independent processors using low thread synchronization overhead to operate with
the same level of performance as an increased width VLIW processor. For example, in
an illustrative embodiment, a VLIW processor includes two independent four-wide
VLIW processors to selectively operate either as an eight-wide VLIW processor or
two independent four-wide VLIW processors that mutually execute separate threads.
Each of the independent processors includes a very rich set of functional units to
form, in combination, a highly powerful processor, without the complexity of
implementing the extensive control circuitry and connections of an eight-wide VLIW
processor.

Detail Description Paragraph:

[0059] The illustrative VLIW processor 100 executes a plurality of instructions in
parallel on the plurality of independent processors using low thread
synchronization overhead to operate with the same level of performance as an
increased width VLIW processor. For applications with a high level of
instruction-level parallelism, the VLIW processor 100 utilizes the two independent four-wide
VLIW processors, media processing units 110 and 112, to selectively operate either
as an eight-wide VLIW processor or two independent four-wide VLIW processors that
mutually execute separate threads. The independent media processing units 110 and
112 each includes a very rich set of functional units to form, in combination, a
highly powerful processor without the complexity of implementing the extensive
control circuitry and connections of an eight-wide VLIW processor.

Detail Description Paragraph:

[0076] Referring to FIG. 5B, a schematic block diagram illustrates a logical view
of a combination of two independent processors to form a wide-VLIW processor 100.
Each of the illustrative VLIW processors includes an instruction buffer 214, a
register file 216, and four functional units arranged in a group of three media
functional units 220, and one general functional unit 222. The instruction buffer
214 in each of the independent processors supplies up to four subinstructions to
the register file 216. The up to four subinstructions execute on the media
functional units 220 and the general functional unit 222. For applications with a
high level of instruction-level parallelism, the eight-wide VLIW processor 100
executes a plurality of instructions in parallel on the plurality of independent
processors using low thread synchronization overhead to operate with the same level
of performance as an increased width VLIW processor. The VLIW processor 100
includes two independent four-wide VLIW processors 110 and 112 to selectively
operate either as an eight-wide VLIW processor or two independent four-wide VLIW
processors that mutually execute separate threads. The independent processors
include a very rich set of functional units to form, in combination, a highly
powerful processor, without the complexity of implementing the extensive control
circuitry and connections of the eight-wide VLIW processor.
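
The two operating modes described in these paragraphs can be illustrated with a toy issue model. This is a sketch under simplifying assumptions, not the processor's actual control logic: the function name and the mode labels `"wide"` and `"threads"` are invented, and data dependencies, functional-unit types and synchronization are ignored entirely.

```python
def schedule(mode, streams, width=4, units=2):
    """Return, for each cycle, the subinstructions handed to each processing unit.

    mode == "wide":    one stream is spread across all units (eight-wide issue).
    mode == "threads": each unit draws from its own stream (two four-wide threads).
    """
    cycles = []
    if mode == "wide":
        (stream,) = streams                    # exactly one instruction stream
        step = width * units
        for start in range(0, len(stream), step):
            bundle = stream[start:start + step]
            cycles.append([bundle[u * width:(u + 1) * width] for u in range(units)])
    elif mode == "threads":
        if len(streams) != units:
            raise ValueError("need one stream per processing unit")
        longest = max(len(s) for s in streams)
        for start in range(0, longest, width):
            cycles.append([s[start:start + width] for s in streams])
    else:
        raise ValueError("mode must be 'wide' or 'threads'")
    return cycles


wide = schedule("wide", [list(range(8))])
threads = schedule("threads", [[0, 1, 2, 3, 4], [10, 11]])
```

In the wide case both four-wide units are filled from one stream in a single cycle; in the threaded case each unit advances through its own stream at up to four subinstructions per cycle.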
ABSTRACTED-PUB-NO: US20010042190A
BASIC-ABSTRACT:

NOVELTY - The processor has a multi-ported register file (600) divided into 
register file segments (610,612,614,616). One of the segments is coupled to and 
associated with one of functional units. The segments in turn are divided into 
global and local registers that are accessible by functional units, and a 
functional unit associated with the register file segment containing the local 
registers, respectively. 

DETAILED DESCRIPTION - The register file is coupled to the decoder which decodes 
very long instruction word including several sub-instructions being allocated into 
positions of the instruction word. The local and global registers are addressed 
using register addresses in an address space that is defined for a register file 
segment/functional unit pair. An INDEPENDENT CLAIM is also included for operating 
method of very long instruction word processor. 

USE - Very long instruction word (VLIW) processor. 

ADVANTAGE - Provides split register which allows reduced size and improved 
performance through faster access. Wastage of registers is avoided in the 
processor, by supporting individual marking of registers. 

DESCRIPTION OF DRAWING(S) - The figure shows the schematic block diagram of
register file for VLIW processor. 

Multi-ported register file 600 

Register file segments 610,612,614,616 

ABSTRACTED-PUB-NO: WO 200033178A
EQUIVALENT-ABSTRACTS:

NOVELTY - The processor has a multi-ported register file (600) divided into
register file segments (610,612,614,616). One of the segments is coupled to and 
associated with one of functional units. The segments in turn are divided into 
global and local registers that are accessible by functional units, and a 
functional unit associated with the register file segment containing the local 
registers, respectively. 

DETAILED DESCRIPTION - The register file is coupled to the decoder which decodes 
very long instruction word including several sub-instructions being allocated into 
positions of the instruction word. The local and global registers are addressed 
using register addresses in an address space that is defined for a register file 
segment/functional unit pair. An INDEPENDENT CLAIM is also included for operating 
method of very long instruction word processor. 

USE - Very long instruction word (VLIW) processor. 

ADVANTAGE - Provides split register which allows reduced size and improved 
performance through faster access. Wastage of registers is avoided in the 
processor, by supporting individual marking of registers. 
DESCRIPTION OF DRAWING(S) - The figure shows the schematic block diagram of
register file for VLIW processor. 

Multi-ported register file 600 

Register file segments 610,612,614,616 
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Abstract of US2001042190 

A Very Long Instruction Word (VLIW) processor 
having a plurality of functional units includes a 
multi-ported register file that is divided into a plurality 
of separate register file segments, each of the register 
file segments being associated to one of the plurality 
of functional units. The register file segments are 
partitioned into local registers and global registers. 
The global registers are read and written by all 
functional units. The local registers are read and 
written only by a functional unit associated with a 
particular register file segment. The local registers 
and global registers are addressed using register 
addresses in an address space that is separately 
defined for a register file segment/functional unit pair.
The global registers are addressed within a selected 
global register range using the same register 
addresses for the plurality of register file 
segment/functional unit pairs. The local registers in a 
register file segment are addressed using register 
addresses in a local register range outside the global 
register range that are assigned within a single 
register file segment/functional unit pair. Register 
addresses in the local register range are the same for 
the plurality of register file segment/functional unit 
pairs and address registers locally within a register file 
segment/functional unit pair.
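
The addressing scheme in this abstract can be modeled briefly. The sketch below is illustrative only: the class name, the four segments, and the register counts (96 global, 32 local) are arbitrary assumptions, and the globals are kept in one shared list for simplicity rather than modeling how they are held across the physical segments.

```python
class SegmentedRegisterFile:
    """Toy model: shared global registers plus per-unit local registers."""

    def __init__(self, segments=4, global_count=96, local_count=32):
        self.global_count = global_count          # assumed size of the global range
        self.local_count = local_count            # assumed size of the local range
        self.globals = [0] * global_count
        self.locals = [[0] * local_count for _ in range(segments)]

    def _slot(self, unit, addr):
        # Addresses in the global range name the same register for every unit.
        if 0 <= addr < self.global_count:
            return self.globals, addr
        # Addresses above it are reused by every unit but resolve to the
        # local registers of that unit's own segment.
        local_index = addr - self.global_count
        if 0 <= local_index < self.local_count:
            return self.locals[unit], local_index
        raise IndexError("register address out of range")

    def write(self, unit, addr, value):
        bank, index = self._slot(unit, addr)
        bank[index] = value

    def read(self, unit, addr):
        bank, index = self._slot(unit, addr)
        return bank[index]


rf = SegmentedRegisterFile()
rf.write(0, 5, 11)     # global register 5, written by unit 0
rf.write(0, 100, 7)    # address 100 is local to unit 0
rf.write(1, 100, 9)    # same address, but unit 1's own local register
```

The same numeric address therefore reaches different storage depending on which functional unit issues it, which is what lets the local range be reused across all segment/unit pairs.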